LA Water Use by Zipcode


In [53]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [54]:
data = pd.read_csv("Water_Use_Average_By_Zipcode.csv")
data.head()


Out[54]:
FY 05/06 FY 06/07 FY 07/08 FY 08/09 FY 09/10 FY 10/11 FY 11/12 FY 12/13 Location 1
0 20 21 19 18 15 14 15 10 90001\n(33.973271508000494, -118.24896959899968)
1 20 21 19 18 17 16 17 11 90002\n(33.94895251300045, -118.24697958699971)
2 21 20 19 19 18 17 17 10 90003\n(33.96335186100049, -118.27393599999971)
3 41 42 39 38 35 34 34 18 90004\n(34.07572239300049, -118.30301712299968)
4 74 74 67 65 63 63 63 25 90005\n(34.05907233200048, -118.30301712299968)

What's going on in that last column? If those are zip codes and then latitude/longitude, we'll need to clean it up.


In [55]:
data['Location 1']


Out[55]:
0     90001\n(33.973271508000494, -118.24896959899968)
1      90002\n(33.94895251300045, -118.24697958699971)
2      90003\n(33.96335186100049, -118.27393599999971)
3      90004\n(34.07572239300049, -118.30301712299968)
4      90005\n(34.05907233200048, -118.30301712299968)
5       90006\n(34.05010424100044, -118.2926890439997)
6      90007\n(34.02833141800045, -118.28507659499968)
7      90008\n(34.01097312100046, -118.34158773599972)
8      90010\n(34.06139256500046, -118.30969974099969)
9      90011\n(34.00790120200048, -118.25903697699971)
10    90012\n(34.059480762000476, -118.24204800599972)
11     90013\n(34.04726260700045, -118.24609754999972)
12     90015\n(34.04012376300045, -118.26498963499972)
13     90016\n(34.02888323300044, -118.35308896599969)
14     90017\n(34.05230281300044, -118.26338632799968)
...
104      91402\n(34.22575108800049, -118.4489267519997)
105      91403\n(34.15359358100045, -118.4596676059997)
106      91405\n(34.20047184600048, -118.4486964669997)
107    91406\n(34.199811405000446, -118.48688906499967)
108     91411\n(34.17933339400048, -118.45792960399967)
109     91423\n(34.15306349100047, -118.43295885699968)
110      91436\n(34.15384124700046, -118.4873366009997)
111    91504\n(34.198251548000485, -118.33049928199972)
112     91505\n(34.16767140000047, -118.34449888999967)
113    91601\n(34.167132618000494, -118.37252852099971)
114      91602\n(34.15134286900047, -118.3690785869997)
115      91604\n(34.14446038000045, -118.3942796179997)
116     91605\n(34.20546425700047, -118.40193986299971)
117      91606\n(34.18673293900048, -118.3868062089997)
118     91607\n(34.16615064800044, -118.39933720499971)
Name: Location 1, Length: 119, dtype: object

In [56]:
temp = data['Location 1'][0]
temp


Out[56]:
'90001\n(33.973271508000494, -118.24896959899968)'

In [57]:
def cleanZipLocation(messy):
    return messy.split('\n')

print data.iloc[0][-1]
vals = data['Location 1'].apply(cleanZipLocation)
zips = [x[0] for x in vals]
locs = [x[1] for x in vals]
lattitudes = [x.split(',')[0][1:] for x in locs]
longitudes = [x.split(',')[1][0:-1] for x in locs]
data['Zip'] = zips
data['Longitude'] = longitudes
data['Latitude'] = lattitudes
del data['Location 1']
data.head()


90001
(33.973271508000494, -118.24896959899968)
Out[57]:
FY 05/06 FY 06/07 FY 07/08 FY 08/09 FY 09/10 FY 10/11 FY 11/12 FY 12/13 Zip Longitude Latitude
0 20 21 19 18 15 14 15 10 90001 -118.24896959899968 33.973271508000494
1 20 21 19 18 17 16 17 11 90002 -118.24697958699971 33.94895251300045
2 21 20 19 19 18 17 17 10 90003 -118.27393599999971 33.96335186100049
3 41 42 39 38 35 34 34 18 90004 -118.30301712299968 34.07572239300049
4 74 74 67 65 63 63 63 25 90005 -118.30301712299968 34.05907233200048

Much better.


In [74]:
for i in range(len(data)):
    plt.plot([6,7,8,9,10,11,12,13],data.iloc[i][0:-3].values,marker='x')

plt.show()



In [75]:
len(data)


Out[75]:
119

In [78]:
data[data['FY 09/10']>55]


Out[78]:
FY 05/06 FY 06/07 FY 07/08 FY 08/09 FY 09/10 FY 10/11 FY 11/12 FY 12/13 Zip Longitude Latitude
4 74 74 67 65 63 63 63 25 90005 -118.30301712299968 34.05907233200048
8 138 138 129 128 119 120 119 9 90010 -118.30969974099969 34.06139256500046
10 117 119 107 106 101 102 101 8 90012 -118.24204800599972 34.059480762000476
14 160 164 175 176 164 157 154 11 90017 -118.26338632799968 34.05230281300044
17 98 99 93 90 84 82 82 37 90020 -118.30164844599972 34.06636325100044
20 76 76 74 72 63 67 68 28 90024 -118.4402367409997 34.06136215000049
45 102 103 95 95 91 90 91 9 90057 -118.27713826999968 34.06211383600049
57 79 79 81 87 86 90 88 9 90094 -118.42532902799968 33.972932598000455
77 70 79 84 83 73 74 66 11 90744 -118.26266940199969 33.78372064300049

Big water users shown above.


In [79]:
data[data['FY 12/13']>55]


Out[79]:
FY 05/06 FY 06/07 FY 07/08 FY 08/09 FY 09/10 FY 10/11 FY 11/12 FY 12/13 Zip Longitude Latitude
59 37 51 56 55 50 26 42 70 90212 -118.40025834499971 34.063843148000444

This is the blue line identified in the plot, the one that has increased water use over the recent years.


In [83]:
plt.plot(range(6,14),data.iloc[59][0:-3])
plt.title("Use of water in 90212 zip area")
plt.xlabel("Years")


Out[83]:
<matplotlib.text.Text at 0x10b8a7b10>

I went on Google maps, and apparently the 90212 zip code is in Beverly Hills.

Let's just go ahead and show then all individually


In [85]:
plt.figure(figsize=(50,50))
for i in range(len(data)):
    plt.subplot(12,10,i+1)
    plt.plot(range(6,14),data.iloc[i][0:-3])
    plt.ylim([0,180])
    plt.title("Zip: "+str(data.iloc[i]['Zip']))

plt.show()



In [ ]: